Add internal API and omdb command to expunge a sled #5234

jgallagher · 2024-03-08T22:59:41Z

The omdb interface might be overly cautious; feedback welcome. You can try this against omicron-dev run-all; it looks roughly like:

note: using Nexus URL http://[::1]:12221
note: using database URL postgresql://root@[::1]:42285/omicron?sslmode=disable
note: database schema version matches expected (39.0.0)
WARNING: sled b6d65341-167c-41df-9b5c-41cded99c229 IS PRESENT in the most recent inventory collection; are you sure you want to mark it expunged?
WARNING: This operation will PERMANENTLY and IRRECOVABLY mark sled b6d65341-167c-41df-9b5c-41cded99c229 (sim-b6d65341) expunged. To proceed, type the sled's serial number.
sled serial〉sim-b6d65341
expunged sled b6d65341-167c-41df-9b5c-41cded99c229 (previous policy: InService(Provisionable))

This handles the "internal" half of #5134.

sunshowers

Looks really good overall, thanks for doing it! Just a few comments.

sunshowers · 2024-03-11T19:08:11Z

dev-tools/omdb/src/bin/omdb/db.rs

+        log: &slog::Logger,
+    ) -> anyhow::Result<Arc<DataStore>> {
+        let db_url = self.resolve_pg_url(omdb, log).await?;
+        eprintln!("note: using database URL {}", &db_url);


Can this use the logger?

I don't think it's supposed to, but I'm not positive. Probably a question for @davepacheco; I think the logger is passed only to the omicron types that need one, and output from omdb itself goes to stdout or stderr.

Interesting. I think one way to handle this would be to have some kind of tag passed in as a kv to indicate that the message is user-facing, and in the log handler do filtering based on that. (You can also show such messages in a different manner, e.g. "warning: <text>" instead of [timestamp WARN] text.)

This is one of very many things that get emitted directly to stdout/stderr; that's basically how omdb works today. There are definitely tradeoffs to doing it that way, but I don't think it makes sense to change just this one, and rethinking the approach seems a lot broader than this PR.
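For reference, the tagging idea suggested above could look roughly like the following. This is only a sketch under stated assumptions: it does not use slog or any omdb code, and `Record`, `Level`, `USER_FACING_KEY`, and `render` are hypothetical names invented for illustration. It shows a handler choosing between a plain "warning: <text>" form and a "[timestamp WARN] text" form based on a key-value tag.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for a log record; not slog's real types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Level {
    Info,
    Warn,
}

struct Record<'a> {
    level: Level,
    msg: &'a str,
    // Key-value pairs attached to the message.
    kv: BTreeMap<&'a str, &'a str>,
}

// Assumed tag name marking a message as meant for the person running the tool.
const USER_FACING_KEY: &str = "user_facing";

// Render a record: user-facing messages get a short prefix, everything else
// gets the conventional "[timestamp LEVEL] message" form.
fn render(record: &Record<'_>, timestamp: &str) -> String {
    let user_facing = record.kv.get(USER_FACING_KEY).copied() == Some("true");
    if user_facing {
        let prefix = match record.level {
            Level::Info => "note",
            Level::Warn => "warning",
        };
        format!("{}: {}", prefix, record.msg)
    } else {
        let level = match record.level {
            Level::Info => "INFO",
            Level::Warn => "WARN",
        };
        format!("[{} {}] {}", timestamp, level, record.msg)
    }
}
```

A real handler would also decide where each kind of message goes (for example, user-facing text to stderr only); that routing is left out here.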

davepacheco · 2024-03-12T21:53:10Z

dev-tools/omdb/src/bin/omdb/nexus.rs

+    // This is an extremely dangerous and irreversible operation. We put a
+    // couple of safeguards in place to ensure this cannot be called without
+    // due consideration:
+    //
+    // 1. We'll require manual input on stdin to confirm the sled to be removed


This seems fine for now, but I kind of expect we're going to want to write automated tests around this sort of thing (maybe not using omdb, though).

Yeah, agreed. @benjaminleonard made some suggestions about how we might design the external API to mimic this kind of confirmation flow, but I did not implement any of that for the internal API. I put these guards on omdb, but (a) we can relax them (or provide flags to skip them) if needed, or (b) tests may be able to hit the internal API directly.
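To make the testing concern concrete, here is a minimal sketch of a confirmation guard whose input is injectable. It is not the code from this PR: only the name `DestructiveOperationToken` and its `#[must_use]` attribute come from the commit messages; `confirm_by_serial`, `expunge_sled`, and their signatures are assumptions made for illustration. The idea is that the destructive call requires a token that can only be obtained by typing the expected serial number.

```rust
use std::io::{self, BufRead, Write};

// Proof that the operator confirmed a destructive operation. The private unit
// field means code outside this module cannot construct one directly.
#[must_use]
pub struct DestructiveOperationToken(());

// Prompt for the sled's serial number and return a token only on an exact
// match. Reading from any `BufRead` lets a test supply the input.
pub fn confirm_by_serial<R: BufRead, W: Write>(
    mut input: R,
    mut output: W,
    expected_serial: &str,
) -> io::Result<Option<DestructiveOperationToken>> {
    write!(output, "sled serial> ")?;
    output.flush()?;

    let mut line = String::new();
    input.read_line(&mut line)?;

    if line.trim() == expected_serial {
        Ok(Some(DestructiveOperationToken(())))
    } else {
        Ok(None)
    }
}

// Hypothetical destructive call: it cannot be invoked without a token.
pub fn expunge_sled(sled_id: &str, _token: DestructiveOperationToken) -> String {
    format!("expunged sled {}", sled_id)
}
```

With this shape, a test can pass an in-memory reader such as `io::Cursor::new("sim-b6d65341\n")` in place of stdin, while the interactive command keeps prompting on the terminal.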

davepacheco · 2024-03-12T21:54:15Z

dev-tools/omdb/src/bin/omdb/nexus.rs

+    let opctx = OpContext::for_tests(log.clone(), datastore.clone());
+    let opctx = &opctx;
+
+    // First, we need to look up the sled so we know its serial number.


Does this mean it won't work for PCs? Or we just won't necessarily be able to do the inventory collection check?

I admit I don't know enough about how PC-based racks are set up to answer this question. I think PCs still have a serial number in the sled table, right? If their sled-agent reports the same serial number, I think both this and the inventory check will work.

I think this should still work on PCs. This is how we are able to add and remove sleds on the a4x2 testbed.
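The inventory check discussed in this thread can be pictured as a lookup by serial number in the latest collection. The sketch below is a simplification under assumptions, not the PR's code: `BaseboardId`, `Collection`, `InventoryCheck`, and `check_inventory` are hypothetical stand-ins, and matching on the serial number alone is assumed from the comments above.

```rust
use std::collections::BTreeSet;

// Simplified stand-in for a baseboard identity recorded by inventory.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct BaseboardId {
    part_number: String,
    serial_number: String,
}

// Simplified stand-in for one inventory collection.
struct Collection {
    baseboards: BTreeSet<BaseboardId>,
}

#[derive(Debug, PartialEq, Eq)]
enum InventoryCheck {
    // The serial was seen in the latest collection: worth a loud warning.
    Present,
    // The serial was not seen in the latest collection.
    Absent,
    // No collection exists, so the check cannot say anything.
    NoCollection,
}

// Report whether a sled's serial number appears in the latest collection.
fn check_inventory(latest: Option<&Collection>, sled_serial: &str) -> InventoryCheck {
    match latest {
        None => InventoryCheck::NoCollection,
        Some(collection) => {
            let seen = collection
                .baseboards
                .iter()
                .any(|b| b.serial_number == sled_serial);
            if seen {
                InventoryCheck::Present
            } else {
                InventoryCheck::Absent
            }
        }
    }
}
```

Under this model the check behaves the same on PC-based racks as long as the serial number reported by sled-agent matches the one recorded in the sled table, which is the condition raised in the replies above.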

davepacheco · 2024-03-12T22:03:01Z

dev-tools/omdb/src/bin/omdb/nexus.rs

+    let (_authz_sled, sled) = LookupPath::new(opctx, &datastore)
+        .sled_id(args.sled_id)
+        .fetch()
+        .await
+        .with_context(|| format!("failed to find sled {}", args.sled_id))?;
+


Is this the only reason we need the database connection? What about having an API instead to fetch sled info? (This is not a big deal either way.)

We use it both for this and fetching the latest inventory, which doesn't have APIs either (but could). Using the database connection in this subcommand seems a little sketchy to me, but I wasn't sure about adding sled info + inventory internal API commands just to support guard rails. I'll keep this as-is for now but it's easy to revisit (or replace if/when we develop those APIs for some other consumers).

jgallagher added 4 commits March 8, 2024 16:59

nexus internal API: add sled-expunge

ee96a64

add omdb nexus sleds expunge <ID>

cd1fa55

cargo fmt

a806902

silence clippy enum variant warnings

45a782a

jgallagher requested review from sunshowers and davepacheco March 8, 2024 22:59

jgallagher commented Mar 11, 2024

dev-tools/omdb/src/bin/omdb/nexus.rs

Merge branch 'main' into john/expunge-sled-via-omdb

e638020

sunshowers reviewed Mar 11, 2024

jgallagher added 3 commits March 11, 2024 16:16

PR feedback

6effe7d

omdb destructive operation check returns a token

27812fd

use collection's interned baseboards instead of checking sps

4142779

sunshowers reviewed Mar 11, 2024

dev-tools/omdb/src/bin/omdb/main.rs

jgallagher added 2 commits March 12, 2024 10:07

Merge branch 'main' into john/expunge-sled-via-omdb

6397933

attach #[must_use] to DestructiveOperationToken

2475a27

davepacheco approved these changes Mar 12, 2024

jgallagher added 2 commits March 13, 2024 10:17

Merge branch 'main' into john/expunge-sled-via-omdb

9ba8d9a

clarify omdb sled expungement user warnings

83e8c48

jgallagher merged commit 4f101be into main Mar 13, 2024
22 checks passed

jgallagher deleted the john/expunge-sled-via-omdb branch March 13, 2024 15:44

jgallagher mentioned this pull request Mar 13, 2024

Define public endpoint for marking sled as expunged #5134

Open

jgallagher commented Mar 8, 2024

sunshowers left a comment

sunshowers Mar 11, 2024

jgallagher Mar 11, 2024

sunshowers Mar 11, 2024 (edited)

davepacheco Mar 12, 2024

davepacheco Mar 12, 2024

jgallagher Mar 13, 2024

davepacheco Mar 12, 2024

jgallagher Mar 13, 2024

andrewjstone Mar 13, 2024

davepacheco Mar 12, 2024

jgallagher Mar 13, 2024
